Feature Selection with Linked Data in Social Media
Authors
Abstract
Feature selection is widely used in preparing high-dimensional data for effective data mining. Increasingly popular social media data presents new challenges to feature selection. Social media data consists of (1) traditional high-dimensional, attribute-value data such as posts, tweets, comments, and images, and (2) linked data that describes the relationships between social media users as well as which user posted which post. The nature of social media also means that its data is massive, noisy, and incomplete, which exacerbates the already challenging problem of feature selection. In this paper, we illustrate the differences between attribute-value data and social media data, investigate whether linked data can be exploited in a new feature selection framework by taking advantage of social science theories, extensively evaluate the effects of user-user and user-post relationships manifested in linked data on feature selection, and discuss some research issues for future work.
Similar resources
Linked Unsupervised Based Advanced Feature Selection Framework with Artificial Bee Colony for Social Media Data
The explosive usage of social media produces large amounts of unlabeled and high-dimensional data. Feature selection has been proven to be effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task due to the absence of label information based on which feature relevance is often assessed. Existing work investi...
CoSelect: Feature Selection with Instance Selection for Social Media Data
Feature selection is widely used in preparing high-dimensional data for effective data mining. Attribute-value data in traditional feature selection differs from social media data, although both can be large-scale. Social media data is inherently not independent and identically distributed (i.i.d.), but linked. Furthermore, there is a lot of noise. The quality of social media data can vary drasti...
Integrating Social Network Structure into Online Feature Selection
Short texts accentuate the challenges posed by the high feature-space dimensionality of text learning tasks. The linked nature of social data causes new dimensions to be added to the feature space, which also becomes sparser. Thus, efficient and scalable online feature selection becomes a crucial requirement of numerous large-scale social applications. This thesis proposes an online feature se...
Publications CIKM 2016
Publications: CIKM 2016 Justin Sampson, Fred Morstatter, Liang Wu and Huan Liu. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection", short paper. Suhang Wang, Jiliang Tang, Fred Morstatter and Huan Liu. "Paired Restricted Boltzmann Machine for Linked Data". Suhang Wang, Jiliang Tang, Charu Aggarwal and Huan Liu. "Linked Document Embedding for Classification". Ke...
What Is Good for One City May Not Be Good for Another One: Evaluating Generalization for Tweet Classification Based on Semantic Abstraction
Social media is a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity. However, these approaches most often focus on regionally restricted datasets, such as data from only one city. The important fact that social media data such as tweets varies considerably across different cities is neglected. ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2012